Experiences from Developing the Domain-Specific Entity Search Engine GeneView

Authors

  • Philippe E. Thomas
  • Johannes Starlinger
  • Ulf Leser
Abstract

GeneView is a semantic search engine for the Life Sciences. Unlike traditional search engines, GeneView does not search indexed documents only at the textual (syntactic) level; it also analyzes texts upon import to recognize and properly handle biomedical entities, relationships between those entities, and the structure of documents. This allows for a number of advanced features required to work effectively with scientific texts, such as precise search despite large numbers of synonyms and homonyms, entity disambiguation, ranking of documents by entity content, linking to structured knowledge about entities, and user-friendly highlighting of recognized entities. As of now, GeneView indexes approximately 21.4 million abstracts and 358,000 full texts with more than 200 million entities of 11 different types and more than 100,000 relationships of three different types. In this paper, we describe the architecture underlying the system with a focus on the complex pipeline of advanced NLP and information extraction tools necessary for achieving the above functionality. We also discuss open challenges in developing and maintaining a semantic search engine over a large (though not web-scale) corpus.
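The difference between textual search and entity-level search with ranking by entity content can be illustrated with a minimal sketch. It is not GeneView's implementation: the synonym dictionary, the entity identifiers, and the function names `recognize_entities`, `build_index`, and `search` are all assumptions made for illustration, and the toy dictionary lookup stands in for the NLP pipeline the abstract refers to. It does not attempt homonym disambiguation.

```python
import re
from collections import Counter, defaultdict

# Hypothetical synonym dictionary: lowercased surface form -> entity ID.
# Names and IDs are illustrative only, not taken from GeneView.
SYNONYMS = {
    "tp53": "GENE:7157",
    "p53": "GENE:7157",
    "tumor protein p53": "GENE:7157",
    "brca1": "GENE:672",
}

# Longest surface forms first, so "tumor protein p53" wins over "p53".
_PATTERN = re.compile(
    r"\b(" + "|".join(
        re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True)
    ) + r")\b"
)

def recognize_entities(text):
    """Toy dictionary-based recognizer: count mentions per entity ID."""
    counts = Counter()
    for match in _PATTERN.finditer(text.lower()):
        counts[SYNONYMS[match.group(1)]] += 1
    return counts

def build_index(docs):
    """Build an inverted index: entity ID -> {doc ID: mention count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for entity_id, n in recognize_entities(text).items():
            index[entity_id][doc_id] = n
    return index

def search(index, query):
    """Resolve the query to an entity ID, then rank documents by how
    often they mention that entity (ties broken by doc ID)."""
    entity_id = SYNONYMS.get(query.lower())
    if entity_id is None:
        return []
    postings = index.get(entity_id, {})
    return sorted(postings, key=lambda d: (-postings[d], d))

if __name__ == "__main__":
    docs = {
        "d1": "TP53 and p53 are the same gene; p53 is well studied.",
        "d2": "Tumor protein p53 interacts with BRCA1.",
        "d3": "No genes are mentioned here.",
    }
    index = build_index(docs)
    print(search(index, "p53"))    # documents mentioning any synonym
    print(search(index, "BRCA1"))
```

Because documents are indexed by entity identifier rather than by surface string, a query for one synonym retrieves documents that use any other synonym of the same entity, which a purely textual match would miss.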


Similar resources

GeneView: a comprehensive semantic search engine for PubMed

Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects s...


Community curation for GeneView

Motivation: The latest discoveries of diseases and their diagnosis or treatments have mostly been published in scientific literature. The fast growth of published biomedical articles has led to strong ambiguity of disease names, meaning that a traditional keyword-based search for biomedical articles will not lead to satisfying results [DL12]. This problem does not only exist for the terms of diseases, i...


Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the growth of the Web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...


Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...


Building a Semantic Web Search Engine: Challenges and Solutions

Current web search engines return links to documents for user-specified keywords queries. Users have to then manually trawl through lists of links and glean the required information from documents. In contrast, semantic search engines allow more expressive queries over information integrated from multiple sources, and return specific information about entities, for example people, locations, ne...




Publication date: 2013